With the power of convolutional neural networks (CNNs), CNN-based face reconstruction has recently shown promising performance in reconstructing detailed face shape from 2D face images. The success of CNN-based methods relies on a large amount of labeled data. State-of-the-art methods synthesize such data using a coarse morphable face model, which, however, has difficulty generating detailed photo-realistic images of faces (with wrinkles). This paper presents a novel face data generation method. Specifically, we render a large number of photo-realistic face images with different attributes based on inverse rendering. Furthermore, we construct a fine-detailed face image dataset by transferring different scales of details from one image to another. We also construct a large number of video-type adjacent frame pairs by simulating the distribution of real video data. With these constructed datasets, we propose a coarse-to-fine learning framework consisting of three convolutional networks. The networks are trained for real-time detailed 3D face reconstruction from monocular video as well as from a single image. Extensive experimental results demonstrate that our framework produces high-quality reconstructions with much less computation time than the state of the art. Moreover, our method is robust to pose, expression, and lighting due to the diversity of the data.